HYPERFLEET-752 | ci: Improve E2E CI Test deployment logic by yingzhanredhat · Pull Request #51 · openshift-hyperfleet/hyperfleet-e2e

yingzhanredhat · 2026-03-20T02:20:07Z

Summary by CodeRabbit

New Features
- CLI option to customize the debug log directory.
- Helper to capture and archive comprehensive debug artifacts on failure.
Improvements
- Automatic capture and archival of debug logs on deployment or health-check failures.
- Failed deployments now attempt timed cleanup to remove partial releases.
- Releases now include discovery labels and randomized suffixes to avoid name collisions.
Bug Fixes
- Uninstall now finds and removes all matching releases, reducing leftovers.

coderabbitai · 2026-03-20T02:20:22Z

Walkthrough

Adds centralized debug-log capture and automated cleanup to CLM deploy scripts. Introduces DEBUG_LOG_DIR (default: ${PROJECT_ROOT}/.debug-work) and CLI flag --debug-log-dir. Adds capture_debug_logs(namespace, selector, component_name, output_dir) which collects pod logs, descriptions, events, workloads and services into a timestamped directory. Installer flows (adapter, api, sentinel) call capture_debug_logs on Helm failures or health-check failures and then attempt helm uninstall <release> -n <ns> --wait --timeout 5m. Adapter installs now append an 8‑char random suffix to release names, add Helm labels (adapter-resource-type, adapter-name), and uninstalls query releases by those labels (with a prefix fallback).
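The walkthrough names the four arguments of capture_debug_logs but does not show its body. The following is only a rough sketch of what such a helper could look like, not the code merged in this PR. The file names, the exact kubectl subcommands, and the overridable KUBECTL variable are assumptions made for illustration.

```shell
# Hypothetical sketch of a debug-capture helper; not the merged implementation.
# KUBECTL is an assumed override hook so the sketch can run without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

capture_debug_logs() {
    local namespace="$1" selector="$2" component_name="$3" output_dir="$4"
    local run_dir

    # One timestamped directory per capture (second-level granularity).
    run_dir="${output_dir}/${component_name}-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "${run_dir}" || return 1

    # Each capture is best-effort: a failing command must not stop the others.
    "${KUBECTL}" logs -n "${namespace}" -l "${selector}" --all-containers --tail=-1 \
        > "${run_dir}/pods.log" 2>&1 || true
    "${KUBECTL}" describe pods -n "${namespace}" -l "${selector}" \
        > "${run_dir}/describe.txt" 2>&1 || true
    "${KUBECTL}" get events -n "${namespace}" --sort-by=.lastTimestamp \
        > "${run_dir}/events.txt" 2>&1 || true
    "${KUBECTL}" get deployments,statefulsets,services -n "${namespace}" -o wide \
        > "${run_dir}/workloads.txt" 2>&1 || true

    # Print the directory so the caller can report where the artifacts went.
    echo "${run_dir}"
}
```

A caller would typically run it from the failure branch, for example `capture_debug_logs "${NAMESPACE}" "app=my-adapter" "my-adapter" "${DEBUG_LOG_DIR}"`, where the namespace and selector values are placeholders.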

Sequence Diagram

sequenceDiagram
    actor User
    participant DeployScript as Deploy Script
    participant Helm
    participant Kubernetes
    participant DebugLogs as Debug Log Capture

    User->>DeployScript: Run install (optional --debug-log-dir)
    DeployScript->>Helm: helm upgrade --install <release>-<rand>
    
    alt Helm install succeeds
        Helm-->>DeployScript: Release created
        DeployScript->>Kubernetes: Run health check probe
        alt Health check passes
            Kubernetes-->>DeployScript: Healthy
            DeployScript-->>User: Installation complete
        else Health check fails
            Kubernetes-->>DeployScript: Unhealthy
            DeployScript->>DebugLogs: capture_debug_logs(namespace, selector, release, DEBUG_LOG_DIR)
            DebugLogs->>Kubernetes: kubectl logs/describe/events/workloads/services...
            Kubernetes-->>DebugLogs: Diagnostics collected
            DebugLogs-->>DeployScript: Logs saved
            DeployScript->>Helm: helm uninstall <release>-<rand> -n <ns> --wait --timeout 5m
            Helm-->>DeployScript: Uninstall result
            DeployScript-->>User: Installation failed
        end
    else Helm install fails
        Helm-->>DeployScript: Install failed
        DeployScript->>Helm: helm list -n <ns> --selector adapter-resource-type=...,adapter-name=... -q
        alt Matching releases found
            Helm-->>DeployScript: Releases listed
            DeployScript->>DebugLogs: capture_debug_logs(namespace, selector, release, DEBUG_LOG_DIR)
            DebugLogs->>Kubernetes: kubectl logs/describe/events/workloads/services...
            DebugLogs-->>DeployScript: Logs saved
            DeployScript->>Helm: helm uninstall <matching-release> -n <ns> --wait --timeout 5m
            Helm-->>DeployScript: Uninstall result(s)
        else No labeled releases
            DeployScript->>Helm: fallback: list by release-name prefix
            Helm-->>DeployScript: Releases listed
            DeployScript->>Helm: helm uninstall <matching-release> -n <ns> --wait --timeout 5m
        end
        DeployScript-->>User: Installation failed
    end
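The "Helm install fails" branch of the diagram (list by label, fall back to a name prefix, then uninstall each match) can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: the function name uninstall_matching_releases and the HELM override variable are hypothetical, and the prefix fallback is expressed with helm list --filter, which may differ from how the script actually matches prefixes.

```shell
# Hypothetical sketch of label-based uninstall with a prefix fallback.
# HELM is an assumed override hook so the sketch can be exercised with a stub.
HELM="${HELM:-helm}"

uninstall_matching_releases() {
    local namespace="$1" selector="$2" prefix="$3"
    local releases release rc=0

    # First choice: releases carrying the discovery labels.
    releases=$("${HELM}" list -n "${namespace}" --selector "${selector}" -q 2>/dev/null) || releases=""

    # Fallback: releases whose name starts with the expected prefix.
    if [ -z "${releases}" ]; then
        releases=$("${HELM}" list -n "${namespace}" --filter "^${prefix}" -q 2>/dev/null) || releases=""
    fi

    # Helm release names cannot contain whitespace, so word splitting is safe here.
    for release in ${releases}; do
        "${HELM}" uninstall "${release}" -n "${namespace}" --wait --timeout 5m < /dev/null || rc=1
    done
    return "${rc}"
}
```

Returning a non-zero status when any uninstall fails lets the caller decide whether leftover releases should fail the CI job or only produce a warning.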

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly identifies the main change: improving E2E CI test deployment logic with reference to the ticket. However, it is somewhat generic and doesn't specifically convey that the changes involve debug logging capture, random Helm release suffixes, and deployment cleanup improvements.

coderabbitai

🧹 Nitpick comments (1)

deploy-scripts/deploy-clm.sh (1)

445-459: Debug log preservation logic may fail if DEBUG_LOG_DIR is outside WORK_DIR.

If a user specifies --debug-log-dir /var/log/hyperfleet (a path outside WORK_DIR), this preservation logic would unnecessarily move the logs to a temp directory and back. Worse, both mv calls suppress errors with || true, so if the move back fails the logs are deleted together with the temp directory and nothing is reported.

Consider adding a check to only preserve when DEBUG_LOG_DIR is a subdirectory of WORK_DIR:

♻️ Proposed fix

     # Clean up work directory (but preserve debug logs)
     if [[ "${DRY_RUN}" == "false" && "${VERBOSE}" == "false" ]]; then
         log_verbose "Cleaning up work directory"
         # Preserve debug logs if they exist
-        if [[ -d "${DEBUG_LOG_DIR}" ]]; then
+        if [[ -d "${DEBUG_LOG_DIR}" && "${DEBUG_LOG_DIR}" == "${WORK_DIR}"/* ]]; then
             local temp_debug_dir
             temp_debug_dir=$(mktemp -d)
             mv "${DEBUG_LOG_DIR}" "${temp_debug_dir}/debug-logs" 2>/dev/null || true
             rm -rf "${WORK_DIR}"
             mkdir -p "${WORK_DIR}"
             mv "${temp_debug_dir}/debug-logs" "${DEBUG_LOG_DIR}" 2>/dev/null || true
             rm -rf "${temp_debug_dir}"
         else
             rm -rf "${WORK_DIR}"
         fi
     fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@deploy-scripts/deploy-clm.sh` around lines 445 - 459, The preservation logic
for DEBUG_LOG_DIR may move or touch paths outside WORK_DIR; update the cleanup
block to first resolve paths (e.g., realpath) and only perform the temp-preserve
dance when DEBUG_LOG_DIR exists and its resolved path is under WORK_DIR's
resolved path (starts-with check); if DEBUG_LOG_DIR is outside WORK_DIR or
doesn't exist, skip moving it and simply rm -rf "${WORK_DIR}" as before;
reference the DEBUG_LOG_DIR and WORK_DIR variables and the existing
temp_debug_dir usage when making this change.
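The "resolve paths, then do a starts-with check" idea from the prompt above might look like the sketch below. The helper name is_subdir_of is made up for illustration. It resolves both paths with cd and pwd -P rather than realpath so that it does not depend on GNU coreutils, and it assumes both directories already exist.

```shell
# Hypothetical helper: succeeds only if child is a directory strictly below parent.
is_subdir_of() {
    local child="$1" parent="$2" child_real parent_real

    # Resolve symlinks and ".." segments; fail if either directory is missing.
    child_real=$(cd "${child}" 2>/dev/null && pwd -P) || return 1
    parent_real=$(cd "${parent}" 2>/dev/null && pwd -P) || return 1

    # The trailing "/" in the pattern prevents /tmp/work2 from matching /tmp/work.
    case "${child_real}" in
        "${parent_real}"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the cleanup block this would replace the bare -d test, for example `if is_subdir_of "${DEBUG_LOG_DIR}" "${WORK_DIR}"; then ...`, so a debug directory outside WORK_DIR is never moved.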

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de36bfd4-cfa9-4f04-a8bc-5b0f18af6067

📥 Commits

Reviewing files that changed from the base of the PR and between f73bb04 and 1ae2387.

📒 Files selected for processing (5)

deploy-scripts/deploy-clm.sh
deploy-scripts/lib/adapter.sh
deploy-scripts/lib/api.sh
deploy-scripts/lib/common.sh
deploy-scripts/lib/sentinel.sh

yingzhanredhat · 2026-03-20T05:45:01Z

/test lint

yasun1 · 2026-03-20T06:26:58Z

Code review

Found 1 issue:

Race condition in debug log directory preservation - The cleanup logic uses a complex move-delete-recreate-move sequence that can silently lose debug logs if any step fails. Both mv commands use || true which suppresses errors, and the temp directory gets deleted regardless of whether the move-back succeeded.

hyperfleet-e2e/deploy-scripts/deploy-clm.sh

Lines 446 to 460 in 1ae2387

    
    if [[ "${DRY_RUN}" == "false" && "${VERBOSE}" == "false" ]]; then
        log_verbose "Cleaning up work directory"
        # Preserve debug logs if they exist
        if [[ -d "${DEBUG_LOG_DIR}" ]]; then
            local temp_debug_dir
            temp_debug_dir=$(mktemp -d)
            mv "${DEBUG_LOG_DIR}" "${temp_debug_dir}/debug-logs" 2>/dev/null || true
            rm -rf "${WORK_DIR}"
            mkdir -p "${WORK_DIR}"
            mv "${temp_debug_dir}/debug-logs" "${DEBUG_LOG_DIR}" 2>/dev/null || true
            rm -rf "${temp_debug_dir}"
        else
            rm -rf "${WORK_DIR}"
        fi
    fi
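One way to avoid the silent-loss scenario described in this comment is to check each mv and to keep the temp directory whenever the move back fails. The sketch below is an illustrative rework under assumptions, not the fix that was eventually pushed. The function name cleanup_work_dir and its two positional arguments are hypothetical stand-ins for the WORK_DIR and DEBUG_LOG_DIR globals.

```shell
# Hypothetical rework of the cleanup step; names are illustrative only.
cleanup_work_dir() {
    local work_dir="$1" debug_dir="$2" temp_dir preserve=false

    # Preserve only when the debug directory exists and lives inside work_dir.
    if [ -d "${debug_dir}" ]; then
        case "${debug_dir}" in
            "${work_dir}"/*) preserve=true ;;
        esac
    fi
    if [ "${preserve}" = false ]; then
        rm -rf "${work_dir}"
        return
    fi

    temp_dir=$(mktemp -d) || return 1
    if ! mv "${debug_dir}" "${temp_dir}/debug-logs"; then
        echo "could not move ${debug_dir} aside; leaving ${work_dir} untouched" >&2
        rmdir "${temp_dir}"
        return 1
    fi

    rm -rf "${work_dir}"
    if mkdir -p "$(dirname "${debug_dir}")" && mv "${temp_dir}/debug-logs" "${debug_dir}"; then
        rmdir "${temp_dir}"
    else
        # Never delete the only remaining copy of the logs.
        echo "debug logs kept in ${temp_dir}/debug-logs" >&2
        return 1
    fi
}
```

The design choice here is that a failed step stops the sequence and reports where the logs are, instead of continuing and deleting the temp directory unconditionally.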

🤖 Generated with Claude Code

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-scripts/lib/common.sh`:
- Around line 195-201: The debug-log capture currently allows empty WORK_DIR and
ignores failures from directory creation and capture commands; update the logic
around the output_dir and capture steps so failures are propagated: validate
that WORK_DIR (used when computing output_dir) is non-empty before defaulting,
check the return code of mkdir -p for "${output_dir}" and exit non-zero with an
error log via log_section/processLogger on failure, and similarly add explicit
error checks for the capture commands referenced around lines 259-264 so any
failed capture causes the function to log the error and return a non-zero exit
code instead of silently succeeding.
- Around line 202-205: The current timestamp used to build log_prefix (variables
timestamp and log_prefix, referencing output_dir and component_name) has only
second-level granularity and can collide; update timestamp generation to be
collision-resistant by including higher-resolution time and a unique process
identifier (e.g., use date +"%Y%m%d-%H%M%S-%N" and append $$ or similar) or use
a safe unique generator (mktemp/uuid) and then rebuild
log_prefix="${output_dir}/${component_name}-${timestamp}" so concurrent runs
cannot overwrite each other.
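Both inline findings (propagate mkdir failures, make the per-run prefix collision-resistant) can be illustrated with a small helper. This is a sketch under assumptions: the name make_log_dir is hypothetical, and it uses mktemp -d with a template instead of date's %N, since %N is a GNU extension and is not available in every date implementation.

```shell
# Hypothetical helper: create a unique per-run log directory and fail loudly.
make_log_dir() {
    local output_dir="$1" component_name="$2" timestamp log_dir

    if [ -z "${output_dir}" ]; then
        echo "output_dir must not be empty" >&2
        return 1
    fi
    if ! mkdir -p "${output_dir}"; then
        echo "failed to create ${output_dir}" >&2
        return 1
    fi

    timestamp=$(date +%Y%m%d-%H%M%S)
    # mktemp -d creates the directory atomically and replaces XXXXXX with
    # random characters, so two runs in the same second get distinct paths.
    log_dir=$(mktemp -d "${output_dir}/${component_name}-${timestamp}-XXXXXX") || return 1
    echo "${log_dir}"
}
```

A caller could then build file names under the returned directory, for example `log_dir=$(make_log_dir "${DEBUG_LOG_DIR}" "api") || return 1`.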

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d925e03-58fb-4db5-bc75-505579580ac6

📥 Commits

Reviewing files that changed from the base of the PR and between 1ae2387 and 4558a57.

📒 Files selected for processing (5)

deploy-scripts/deploy-clm.sh
deploy-scripts/lib/adapter.sh
deploy-scripts/lib/api.sh
deploy-scripts/lib/common.sh
deploy-scripts/lib/sentinel.sh

🚧 Files skipped from review as they are similar to previous changes (4)

deploy-scripts/deploy-clm.sh
deploy-scripts/lib/api.sh
deploy-scripts/lib/sentinel.sh
deploy-scripts/lib/adapter.sh

deploy-scripts/lib/common.sh

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-scripts/deploy-clm.sh`:
- Around line 101-103: The help text for the --debug-log-dir flag is
inconsistent with the actual default
(DEBUG_LOG_DIR="${DEBUG_LOG_DIR:-${PROJECT_ROOT}/.debug-work}"); update the
documented default in the option/usage text for --debug-log-dir to
"${PROJECT_ROOT}/.debug-work" so the help matches the implementation (or
alternatively change the DEBUG_LOG_DIR assignment to use ${WORK_DIR}/debug-logs
if you prefer the documented path); ensure you modify the help string that
references ${WORK_DIR}/debug-logs and keep the flag name --debug-log-dir and
variable DEBUG_LOG_DIR in sync.
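A simple way to keep the help text and the default from drifting apart is to define the default once and interpolate it into the usage string. The snippet below is a minimal sketch of that pattern. DEFAULT_DEBUG_LOG_DIR is a hypothetical variable introduced for illustration, and the usage text is not the script's real help output.

```shell
# Hypothetical pattern: a single source of truth for the default path.
PROJECT_ROOT="${PROJECT_ROOT:-$(pwd)}"
DEFAULT_DEBUG_LOG_DIR="${PROJECT_ROOT}/.debug-work"
DEBUG_LOG_DIR="${DEBUG_LOG_DIR:-${DEFAULT_DEBUG_LOG_DIR}}"

usage() {
    # The heredoc is unquoted so the default is expanded when help is printed.
    cat <<EOF
Options:
  --debug-log-dir <dir>   Directory for debug logs (default: ${DEFAULT_DEBUG_LOG_DIR})
EOF
}
```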

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 446ca419-15e2-44b0-98da-c957b9c3b2ba

📥 Commits

Reviewing files that changed from the base of the PR and between 4558a57 and bec5898.

📒 Files selected for processing (5)

deploy-scripts/deploy-clm.sh
deploy-scripts/lib/adapter.sh
deploy-scripts/lib/api.sh
deploy-scripts/lib/common.sh
deploy-scripts/lib/sentinel.sh

🚧 Files skipped from review as they are similar to previous changes (3)

deploy-scripts/lib/api.sh
deploy-scripts/lib/common.sh
deploy-scripts/lib/sentinel.sh

deploy-scripts/deploy-clm.sh

deploy-scripts/lib/adapter.sh

coderabbitai

🧹 Nitpick comments (1)

deploy-scripts/lib/adapter.sh (1)

110-112: Inefficient /dev/urandom read pattern.

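With no options, head prints the first 10 lines, so head /dev/urandom keeps reading until it has seen ten newline bytes. On random data that is a variable amount, typically a few kilobytes, of which only 8 characters are kept. Feeding /dev/urandom directly to tr and stopping with head -c 8 removes the extra process and the open-ended read.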

♻️ Proposed fix

   # Generate random suffix to prevent namespace conflicts
   local random_suffix
-  random_suffix=$(head /dev/urandom | LC_ALL=C tr -dc 'a-z0-9' | head -c 8)
+  random_suffix=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 8)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@deploy-scripts/lib/adapter.sh` around lines 110 - 112, Replace the unbounded
"head /dev/urandom" pattern used to generate random_suffix with a fixed-byte
read from /dev/urandom so you only consume the entropy you need; update the
random_suffix assignment (the random_suffix variable initialization) to read a
small, fixed number of bytes (e.g., one block) from /dev/urandom and then filter
to [a-z0-9] and cut to 8 characters, rather than piping an open-ended head, to
avoid buffering a large chunk and wasting CPU/entropy.
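The fixed-byte variant described in the prompt above could be sketched as follows. This is an illustration, not the code in adapter.sh: the function name random_suffix and the 64-byte chunk size are assumptions. Reading bounded chunks in a loop also avoids the SIGPIPE exit status that `tr ... < /dev/urandom | head -c 8` produces when the script runs with `set -o pipefail`.

```shell
# Hypothetical helper: print N random characters from [a-z0-9] (default 8).
random_suffix() {
    local length="${1:-8}" suffix=""

    # Read small fixed-size chunks until enough characters survive the filter.
    while [ "${#suffix}" -lt "${length}" ]; do
        suffix="${suffix}$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9')"
    done

    # Trim to the requested length; cut reads all of its input, so no early close.
    printf '%s\n' "${suffix}" | cut -c "1-${length}"
}
```

A release name would then be built as `release_name="${base_name}-$(random_suffix)"`, where base_name is a placeholder for whatever prefix the script uses.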

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f8e5c39-9c43-4c14-b084-787bdb685d6d

📥 Commits

Reviewing files that changed from the base of the PR and between bec5898 and e17776f.

📒 Files selected for processing (5)

deploy-scripts/deploy-clm.sh
deploy-scripts/lib/adapter.sh
deploy-scripts/lib/api.sh
deploy-scripts/lib/common.sh
deploy-scripts/lib/sentinel.sh

🚧 Files skipped from review as they are similar to previous changes (1)

deploy-scripts/lib/sentinel.sh

rafabene · 2026-03-23T13:07:32Z

/lgtm

openshift-ci · 2026-03-23T13:07:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rafabene

Needs approval from an approver in each of these files:

~~OWNERS~~ [rafabene]

yingzhanredhat requested a review from yasun1 March 20, 2026 02:20

openshift-ci bot requested review from crizzo71 and rafabene March 20, 2026 02:20

coderabbitai bot reviewed Mar 20, 2026

yingzhanredhat force-pushed the hyperfleet-752 branch from 1ae2387 to 4558a57 March 20, 2026 08:09

coderabbitai bot reviewed Mar 20, 2026

deploy-scripts/lib/common.sh (outdated, resolved)

deploy-scripts/lib/common.sh (resolved)

yingzhanredhat force-pushed the hyperfleet-752 branch from 4558a57 to bec5898 March 20, 2026 08:27

coderabbitai bot reviewed Mar 20, 2026

deploy-scripts/deploy-clm.sh (resolved)

rafabene reviewed Mar 20, 2026

deploy-scripts/lib/adapter.sh (resolved)

rafabene reviewed Mar 20, 2026

deploy-scripts/lib/adapter.sh (outdated, resolved)

HYPERFLEET-752 | ci: Improve E2E CI Test deployment logic

e17776f

yingzhanredhat force-pushed the hyperfleet-752 branch from bec5898 to e17776f March 23, 2026 02:30

coderabbitai bot reviewed Mar 23, 2026

openshift-ci bot assigned rafabene Mar 23, 2026

openshift-ci bot added the lgtm label Mar 23, 2026

openshift-ci bot added the approved label Mar 23, 2026

openshift-merge-bot bot merged commit 00fc4eb into openshift-hyperfleet:main Mar 23, 2026
5 checks passed
